Data Mining of Web Access Logs
Author
Abstract
Analysis of web visitors' access patterns can yield benefits in a wide range of areas, such as decision support and website restructuring. Data mining techniques can be used to find access patterns hidden inside huge volumes of web access data. The goal of this thesis is to determine whether there are any such patterns in the web access data for the computer science website of RMIT University. In particular, this thesis investigates whether there are any differences in access patterns between: (1) visitors from within Australia and visitors from outside Australia; (2) visitors from within RMIT University and visitors from outside RMIT University; (3) visitors from within RMIT University and visitors from outside RMIT University but within Australia; and (4) visitors from educational institutions other than RMIT University and visitors from non-educational institutions. The data mining techniques of classification, association rules, clustering and attribute selection were used with four different feature sets. The entire pattern discovery process was divided into three major steps: (1) transaction identification and feature extraction; (2) discovery of the access patterns; and (3) analysis of the discovered patterns for their interestingness. Three major patterns were discovered: (1) Visitors from Australia generally visit the root page, while visitors from outside Australia do not. The most likely reason is that visitors from outside Australia use search engines that direct them to specific pages. (2) However, some visitors from outside Australia visit the root page and pages about postgraduate programs (such as the Master of Technology). This suggests that these visitors are mostly interested in postgraduate studies. (3) Visitors from other educational institutions tend to visit pages related to staff contact information, while other visitors tend to access career and industry related information.
During the course of the investigation, a significant number of long transactions were found. Manual analysis of these long transactions showed that, in a significant number of them, visitors access information about the different programs offered and, towards the end of their visit, look for the information brochure of one program. The significance of the patterns discovered in this thesis suggests that data mining techniques with suitable feature sets can produce very interesting patterns.
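The first step of the process, transaction identification and feature extraction, can be illustrated with a minimal Python sketch. The abstract does not say how the thesis identified transactions or which features it used, so everything here is an assumption made for illustration: the log is taken to be in Common Log Format with resolved host names, a transaction is taken to be a run of requests from one host with no gap longer than 30 minutes, and the function names, the feature names and the `.au` suffix test are hypothetical.

```python
import re
from datetime import datetime, timedelta

# Common Log Format: host ident user [time] "METHOD path protocol" status size
LOG_LINE = re.compile(
    r'^(\S+) \S+ \S+ \[([^\]]+)\] "(?:GET|POST|HEAD) (\S+)[^"]*" \d{3} \S+'
)

# Assumed inactivity gap that ends a transaction (not taken from the thesis).
TIMEOUT = timedelta(minutes=30)


def parse(line):
    """Return (host, timestamp, path) for one log line, or None if it does not match."""
    match = LOG_LINE.match(line)
    if match is None:
        return None
    host, stamp, path = match.groups()
    when = datetime.strptime(stamp, "%d/%b/%Y:%H:%M:%S %z")
    return host, when, path


def identify_transactions(lines):
    """Group requests by host, then split each host's requests at long gaps."""
    by_host = {}
    for line in lines:
        record = parse(line)
        if record is None:
            continue  # skip malformed lines
        host, when, path = record
        by_host.setdefault(host, []).append((when, path))

    transactions = []
    for host, hits in by_host.items():
        hits.sort(key=lambda hit: hit[0])
        current = [hits[0][1]]
        last_seen = hits[0][0]
        for when, path in hits[1:]:
            if when - last_seen > TIMEOUT:
                transactions.append((host, current))
                current = []
            current.append(path)
            last_seen = when
        transactions.append((host, current))
    return transactions


def extract_features(host, pages):
    """Hypothetical feature set for one transaction."""
    return {
        # Crude origin test: assumes the host is a resolved domain name.
        "from_australia": host.endswith(".au"),
        "visited_root": "/" in pages,
        "length": len(pages),
    }
```

Each feature dictionary would then be one row of input for the pattern discovery step (classification, association rules or clustering). Logs that record raw IP addresses rather than host names would need a reverse lookup or a geolocation table before the origin feature could be computed.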
Similar resources
Online and Incremental Mining of Separately-Grouped Web Access Logs
The rising popularity of electronic commerce makes data mining an indispensable technology for business competitiveness. The World Wide Web provides abundant raw data in the form of web access logs, web transaction logs and web user profiles. Without data mining tools, it is impossible to make any sense of such massive data. In this paper, we focus on web usage mining because it deals most appr...
Mining Access Patterns Efficiently from Web Logs
With the explosive growth of data available on the World Wide Web, discovery and analysis of useful information from the World Wide Web becomes a practical necessity. Web access pattern, which is the sequence of accesses pursued by users frequently, is a kind of interesting and useful knowledge in practice. In this paper, we study the problem of mining access patterns from Web logs efficiently. A...
Web Anomaly Detection Through Creating Access Usage Profiles
Due to the increase in cyber-attacks, the need for web server attack detection techniques has drawn attention. Unfortunately, many available security solutions are inefficient in identifying web-based attacks. The main aim of this study is to detect abnormal web navigations based on web usage profiles. In this paper, comparing scrolling behavior of a normal user with an attacker, and simu...
Mining the Most Interesting Web Access Associations
Web access patterns can provide valuable information for website designers in making website-based communication more efficient. To extract interesting or useful web access patterns, we use data mining techniques which analyze historical web access logs. In this paper, we present an efficient approach to mine the most interesting web access associations, where the word "interesting" denotes pat...
Knowledge Discovery from Web Usage Data: Research and Development of Web Access Pattern Tree Based Sequential Pattern Mining Techniques: A Survey
Sequential pattern mining is the process of applying data mining techniques to a sequential database to extract frequent subsequences and discover correlations among the ordered list of events. Web usage mining (WUM), which discovers and extracts interesting knowledge and patterns from Web logs, is one of the applications of sequential pattern mining. In this paper, we present a survey of the se...
Effective web log mining and online navigational pattern prediction
The web has become the world's largest repository of knowledge. Web usage mining is the process of discovering knowledge from the interactions generated by the user in the form of access logs, cookies, and user sessions data. Web Mining consists of three different categories, namely Web Content Mining, Web Structure Mining, and Web Usage Mining (is the process of discovering knowledge from the ...